Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces

Authors

Abstract

Reinforcement learning consists of finding policies that maximize an expected cumulative long-term reward in a Markov decision process with unknown transition probabilities and instantaneous rewards. In this article, we consider the problem of finding such optimal policies while assuming they are continuous functions belonging to a reproducing kernel Hilbert space (RKHS). To learn the optimal policy, we introduce a stochastic policy gradient ascent algorithm with the following three unique novel features. First, the stochastic estimates of policy gradients are unbiased. Second, the variance of stochastic gradients is reduced by drawing on ideas from numerical differentiation. Third, policy complexity is controlled using sparse RKHS representations. The first feature is instrumental in proving convergence to a stationary point of the expected cumulative reward. The second feature facilitates reasonable convergence times. The third feature is a necessity in practical implementations, which we show can be done in a way that does not eliminate convergence guarantees. Numerical examples in standard problems illustrate successful learning of policies with low-complexity representations that are close to stationary points of the expected cumulative reward.
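As a rough illustration of the three features above, the following minimal Python sketch keeps a Gaussian policy whose mean is a kernel expansion in an RKHS, performs an unbiased REINFORCE-style stochastic gradient ascent step, and keeps the expansion sparse by pruning small coefficients. All names (`RKHSPolicy`, the toy one-step reward, the pruning rule) are hypothetical simplifications and not the paper's algorithm; in particular, the paper's numerical-differentiation variance reduction is omitted here.

```python
import numpy as np

def gaussian_kernel(c, s, bandwidth=1.0):
    """Gaussian RBF kernel on scalar states."""
    return np.exp(-((c - s) ** 2) / (2.0 * bandwidth ** 2))

class RKHSPolicy:
    """Gaussian policy whose mean is a kernel expansion h(s) = sum_i w_i k(c_i, s)."""

    def __init__(self, sigma=0.5, bandwidth=1.0, prune_tol=1e-2):
        self.centers = []           # kernel centers (visited states)
        self.weights = []           # expansion coefficients
        self.sigma = sigma          # exploration noise std
        self.bandwidth = bandwidth
        self.prune_tol = prune_tol  # sparsification threshold

    def mean(self, s):
        return sum(w * gaussian_kernel(c, s, self.bandwidth)
                   for c, w in zip(self.centers, self.weights))

    def act(self, s, rng):
        return self.mean(s) + self.sigma * rng.normal()

    def update(self, s, a, reward, lr):
        # Score-function (REINFORCE-style) stochastic gradient: the gradient of
        # log N(a; h(s), sigma^2) with respect to h is itself an RKHS element,
        # ((a - h(s)) / sigma^2) * k(s, .), so each ascent step adds one center.
        coeff = (a - self.mean(s)) / self.sigma ** 2
        self.centers.append(s)
        self.weights.append(lr * reward * coeff)
        # Crude sparsification: drop kernel centers with negligible weights.
        kept = [(c, w) for c, w in zip(self.centers, self.weights)
                if abs(w) > self.prune_tol]
        self.centers = [c for c, _ in kept]
        self.weights = [w for _, w in kept]

# Toy one-step problem: reward -(a - sin(s))^2, so the optimal mean is sin(s).
rng = np.random.default_rng(0)
policy = RKHSPolicy()
for _ in range(2000):
    s = rng.uniform(-2.0, 2.0)
    a = policy.act(s, rng)
    policy.update(s, a, reward=-(a - np.sin(s)) ** 2, lr=0.05)
```

The score-function estimate is unbiased for the gradient of the expected reward, and the pruning step bounds the number of stored kernel centers at the price of a small, controllable bias.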


Similar articles

Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces

We introduce a functional gradient descent trajectory optimization algorithm for robot motion planning in Reproducing Kernel Hilbert Spaces (RKHSs). Functional gradient algorithms are a popular choice for motion planning in complex many-degree-of-freedom robots, since they (in theory) work by directly optimizing within a space of continuous trajectories to avoid obstacles while maintaining geom...
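The functional gradient idea can be sketched by discretizing the trajectory into waypoints and descending on an obstacle cost plus a smoothness (second-difference) penalty, with the endpoints held fixed. The setup below (the hinge obstacle cost, weights, and step size) is a hypothetical toy example, not the paper's RKHS trajectory parameterization.

```python
import numpy as np

def obstacle_cost_grad(q, center, radius):
    """Gradient of the hinge cost 0.5 * (radius - dist)^2 inside the obstacle ball."""
    d = q - center
    dist = np.linalg.norm(d)
    if dist >= radius or dist == 0.0:
        return np.zeros_like(q)
    return -(radius - dist) * d / dist   # points uphill, toward the obstacle center

def optimize_trajectory(traj, center, radius, steps=200, lr=0.1, w_smooth=1.0):
    """Discretized functional gradient descent over waypoints: obstacle term
    plus a smoothness term (discrete second difference); endpoints stay fixed."""
    traj = traj.copy()
    for _ in range(steps):
        grad = np.zeros_like(traj)
        for i in range(1, len(traj) - 1):
            grad[i] = obstacle_cost_grad(traj[i], center, radius)
            grad[i] += w_smooth * (2 * traj[i] - traj[i - 1] - traj[i + 1])
        traj -= lr * grad
    return traj

# Straight line passing close to a circular obstacle (toy 2-D setup).
start, goal = np.array([0.0, 0.0]), np.array([1.0, 0.0])
line = np.linspace(start, goal, 21)
center, radius = np.array([0.5, 0.05]), 0.2
result = optimize_trajectory(line, center, radius)
```

Descent pushes the interior waypoints out of the obstacle's influence region while the second-difference term keeps the deformed path smooth.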


Stochastic Processes with Sample Paths in Reproducing Kernel Hilbert Spaces

A theorem of M. F. Driscoll says that, under certain restrictions, the probability that a given Gaussian process has its sample paths almost surely in a given reproducing kernel Hilbert space (RKHS) is either 0 or 1. Driscoll also found a necessary and sufficient condition for that probability to be 1. Doing away with Driscoll’s restrictions, R. Fortet generalized his condition and named it nuc...


Real reproducing kernel Hilbert spaces

P(α) = C(α, F(x, y)) = α²F(x, x) + 2αF(x, y) + F(y, y), which is ≥ 0. In the case F(x, x) = 0, the fact that P ≥ 0 implies that F(x, y) = 0. In the case F(x, x) ≠ 0, P(α) is a quadratic polynomial and because P ≥ 0 it follows that the discriminant of P is ≤ 0: 4F(x, y)² − 4 · F(x, x) · F(y, y) ≤ 0. That is, F(x, y)² ≤ F(x, x)F(y, y), and this implies that F ...
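The truncated excerpt above is the standard Cauchy–Schwarz argument for a positive-semidefinite kernel F; in display form, the discriminant step reads:

```latex
P(\alpha) = \alpha^2 F(x,x) + 2\alpha F(x,y) + F(y,y) \ge 0
\quad \text{for all } \alpha \in \mathbb{R},
```

so when $F(x,x) \neq 0$ the discriminant must satisfy

```latex
4F(x,y)^2 - 4\,F(x,x)\,F(y,y) \le 0
\quad\Longrightarrow\quad
F(x,y)^2 \le F(x,x)\,F(y,y).
```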


Distribution Embeddings in Reproducing Kernel Hilbert Spaces

The “kernel trick” is well established as a means of constructing nonlinear algorithms from linear ones, by transferring the linear algorithms to a high dimensional feature space: specifically, a reproducing kernel Hilbert space (RKHS). Recently, it has become clear that a potentially more far reaching use of kernels is as a linear way of dealing with higher order statistics, by embedding proba...
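A mean embedding maps a distribution to the RKHS element E[k(x, ·)], and the RKHS distance between two embeddings is the maximum mean discrepancy (MMD). The following is a minimal sketch with a Gaussian kernel; the sample sizes, bandwidth, and distributions are arbitrary choices for illustration.

```python
import numpy as np

def rbf_kernel_matrix(X, Y, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix k(x, y) = exp(-(x - y)^2 / (2 h^2)) for 1-D samples."""
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased estimate of the squared MMD, i.e. the squared RKHS distance
    between the empirical mean embeddings of the samples X and Y."""
    Kxx = rbf_kernel_matrix(X, X, bandwidth)
    Kyy = rbf_kernel_matrix(Y, Y, bandwidth)
    Kxy = rbf_kernel_matrix(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(0, 1, 500), rng.normal(0, 1, 500))  # same distribution
diff = mmd2_biased(rng.normal(0, 1, 500), rng.normal(3, 1, 500))  # shifted mean
```

The biased estimator is a squared norm, so it is always nonnegative; it is near zero for samples from the same distribution and clearly positive for the shifted one.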



Journal

Journal title: IEEE Transactions on Automatic Control

Year: 2021

ISSN: 0018-9286, 1558-2523, 2334-3303

DOI: https://doi.org/10.1109/tac.2020.3029317